Most people think that internet content consists of the websites they find via search engines, but much of the internet is hidden below the surface. What most people consider the World Wide Web makes up just a small fraction of all web content.

The web can be divided into three main layers: surface, deep and darknet. OSINT investigators should be familiar with the content of each layer so they know what information can be found there, what security risks are present and which methods are best suited to finding that information.

The meaning of surface web

The surface web includes all web content accessible via search engines like Google, Yahoo and Bing. Search engines use web crawlers, or spiders, to scan web pages, index their content and make them searchable. A spider is an automated program that crawls the internet, discovers new web pages, and analyzes the key text on each page along with metadata such as meta titles and meta descriptions in order to add the page to a massive index database. When a user queries a search engine using specific keywords, the search engine searches its index database and retrieves the web pages that match the user's query.
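
The crawl, index and query cycle described above can be sketched with a toy inverted index. This is a minimal illustration under stated assumptions, not how any real search engine is implemented: the page URLs and text are made up, and the function names `tokenize`, `build_index` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical pages a spider might have fetched (URL -> title plus body text).
PAGES = {
    "https://example.org/osint": "OSINT basics: collecting open source intelligence",
    "https://example.org/tor": "Tor basics: how onion routing hides IP addresses",
    "https://example.org/search": "How search engines index web pages",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric keywords."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each keyword to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every keyword in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)

index = build_index(PAGES)
print(search(index, "basics"))         # both pages that mention "basics"
print(search(index, "onion routing"))  # only the Tor page
```

A page that was never crawled never enters `index`, so no query can return it. That is the practical boundary between the surface web and the layers described later.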

There are various search engines and online services for locating information on the surface web; the most popular ones include:

General purpose search engines:

There are local or national search engines that are dedicated to searching within the websites of a particular country or language:

  • Search - Switzerland
  • Baidu - China
  • Goo - Japan
  • Google - can be configured to return results in particular languages only. To apply this filter, follow these steps:
      • On your computer, open Search settings.
      • On the left, click Languages.
      • Under Results Language Filter, click Edit.
      • Select your preferred languages.
  • Metasearch engines - These engines query multiple search engines at once and aggregate the results according to their relevance to the user's query, which saves the time otherwise needed to query each engine separately. MetaGer and Excite are two examples.
  • Google advanced search operators, or Google dorks, can be leveraged to find hard-to-find content on the surface web. Artificial intelligence tools, such as ChatGPT, can be used to quickly create customized Google dorks that match our search needs. DorkGPT is a free online service for generating Google dorks using AI.
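
As a rough illustration of what a Google dork looks like, the sketch below assembles a query from a few widely documented operators (`site:`, `filetype:`, `intitle:`, quoted phrases and `-` exclusions) and turns it into a search URL. The helper names `build_dork` and `google_url` and the domain `example.org` are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

def build_dork(phrase=None, site=None, filetype=None, intitle=None, exclude=()):
    """Combine common Google search operators into one query string."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact-phrase match
    if site:
        parts.append(f"site:{site}")           # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")   # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"')   # phrase must appear in the page title
    parts.extend(f"-{term}" for term in exclude)  # drop results containing a term
    return " ".join(parts)

def google_url(dork):
    """URL-encode the dork as the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": dork})

dork = build_dork(phrase="annual report", site="example.org", filetype="pdf")
print(dork)
print(google_url(dork))
```

Building dorks from named parts rather than by hand makes it easier to vary one operator at a time during an investigation.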

What is the deep web and how can you access it

This is the largest portion of the web. Some studies estimate that it accounts for around 96% of web content. The deep web contains all content that conventional search engines cannot index or discover. Many people confuse the terms "deep web" and "dark web" and use them interchangeably; however, they are distinct. Content on the dark web is intentionally hidden for one reason or another, while content on the deep web is merely inaccessible to search engines.

Deep web content is rarely as nefarious as it may sound. To locate information on the deep web, a user needs to execute a search query on the hosting site or enter the exact URL of the online resource in the web browser address bar. For example, when accessing your account on a social media platform or checking your online banking account, you are accessing deep web content. In addition, deep web content includes the following:

  • Websites behind a paywall – such as media websites providing premium streaming services, like Netflix, and commercial publications that require payment to access, like Janes
  • Grey information, which can include the following: academic papers, preprints, proceedings, conference and discussion papers, research reports, unpublished research papers, marketing reports, newsletters, technical specifications and standards, dissertations, theses, trade publications, memoranda, government reports, documents not published commercially, translations, market surveys, or draft versions of books and articles
  • Government databases, including vital records (birth, marriage and death records), criminal and court records, property and tax records, voter databases and immigration and customs records
  • Private or closed online communities, such as discussion forums and closed Telegram groups
  • Email services, such as Gmail and Outlook, and messages on internet messaging and collaboration apps like Slack or Discord
  • Cloud storage accounts that require a login and private intranets
  • Leaked information websites – sites that specialize in storing leaked information, such as Pastebin and some file-sharing websites
  • Any content that a site's developer has asked search engine crawlers not to index using the site's robots.txt file
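
The robots.txt case in the last item can be checked programmatically. The sketch below uses Python's standard `urllib.robotparser` module to test whether a well-behaved crawler would be allowed to fetch a URL. The robots.txt content, the `example.org` URLs and the helper name `is_crawlable` are made up for illustration. Note that robots.txt is only a request to crawlers: it does not stop a person from opening the page in a browser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real file is served at https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""

def is_crawlable(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_crawlable(ROBOTS_TXT, "https://example.org/blog/post-1"))      # allowed
print(is_crawlable(ROBOTS_TXT, "https://example.org/members/profile"))  # disallowed
```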

Like the surface web, the deep web does not require any special software or configuration to access. We can reach its content with an ordinary web browser over HTTP/HTTPS.

When searching within a website holding deep web content, a user typically uses its internal "search form" functionality to execute direct queries to find information buried in its database. Below are some popular deep websites and the type of information we can expect to find by searching them.

In practice, all internet users access deep web content as part of their daily internet routine. For instance, checking your email or logging in to your social media account on Facebook, Twitter or LinkedIn gives you access to deep web content.

How the dark web is used

The dark web, also known as the darknet, is a small segment of the internet that can only be accessed with specific software. It is estimated to make up less than 1% of the entire web. Content on the darknet is deliberately hidden and is most often associated with contraband and illegal content.

The darknet has a bad reputation as a place where criminals exchange illegal products and services, largely because of underground marketplaces such as the infamous Silk Road and AlphaBay. While the dark web has many legitimate uses, such as bringing news to dissidents under authoritarian regimes, its purported anonymity also attracts many illegal and dangerous activities.

Illicit goods and services that law enforcement officials and other investigators commonly encounter on the darknet include:

  • Fake official documents, such as passports and driving licenses
  • Firearms
  • Stolen credit cards and personal information
  • Malware for sale and ready-to-launch cyberattacks, such as zero-day exploits, distributed denial-of-service (DDoS) attacks and ransomware
  • Drugs
  • Sex trafficking
  • Recruitment and secure communication channels for terrorist organizations seeking to evade authorities

Despite the bad reputation of the darknet, there are many parties that use it for legal purposes, such as:

  • Journalists
  • Whistleblowers
  • Privacy enthusiasts who want to prevent external observers from capturing their online activities and circumvent censorship 
  • Political activists in oppressive regimes who want to hide their online identity

The darknet is not one single network. Many people are familiar with The Onion Router (Tor) network, but it is not the only darknet. There are several darknets, each requiring its own software or a particular web browser configuration to access. You can think of a darknet as an isolated network that resides somewhere online and that you can reach only by configuring your web browser or installing a specific application. Here are the most well-known networks:

  • Tor: This is the most widely known darknet and possibly the largest. Tor uses onion routing to conceal users' IP addresses. The best way to access it is with Tor Browser, which can be used to browse surface websites anonymously as well as to reach Tor websites, also known as onion services, whose addresses end in ".onion".
  • The Invisible Internet Project (I2P): I2P is the second most well-known darknet. I2P allows its users to browse the surface web anonymously. They can also create websites (known as hidden services) and host online services on the I2P network without revealing their physical location. The hidden services are only accessible via the I2P network.
  • Other darknets include Freenet and ZeroNet
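
Because each darknet uses its own addressing, a link's hostname usually indicates which software is needed to open it. The sketch below is a rough triage helper, not an official tool. It assumes two things the text above does not state: current (version 3) onion addresses are 56 base32 characters followed by ".onion", and I2P sites use the ".i2p" suffix. The function name `classify` and the sample addresses are made up.

```python
import re
from urllib.parse import urlsplit

# Assumption: a version 3 onion address is 56 base32 characters plus ".onion".
ONION_V3 = re.compile(r"^[a-z2-7]{56}\.onion$")

def classify(url):
    """Guess which network a URL belongs to from its hostname suffix."""
    # URLs without a scheme have no parsed hostname and fall through to the default.
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(".onion"):
        address = ".".join(host.split(".")[-2:])  # ignore any subdomain labels
        if ONION_V3.match(address):
            return "tor"
        return "tor (unrecognized address format)"
    if host.endswith(".i2p"):
        return "i2p"
    return "surface or deep web"

fake_onion = "http://" + "a" * 56 + ".onion/"  # made-up address, not a real service
print(classify(fake_onion))
print(classify("http://example.i2p/"))
print(classify("https://example.org/login"))
```

The default label is deliberately broad: a hostname alone cannot distinguish surface content from deep web content, since both are served over ordinary HTTP/HTTPS.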

Tor is the most widely known among internet users and is relatively easy to access, with some security caveats in mind. The following are some onion services to start your search across the Tor network.

  • Ahmia - Tor search engine
  • TORCH - another Tor search engine that claims to index 1.1 million pages
  • OnionLand - Tor search engine
  • TorDex - Tor search engine and directory
  • The Hidden Wiki - a directory of popular Tor websites
  • The Facebook Social Network on Tor

How to safely access the dark web

The web is not limited to what we find when using typical search engines; it is much broader. OSINT investigators should know the three layers composing the web and understand how each layer could be searched. This knowledge is essential for planning their search activities and knowing which online services to use within each web layer.

But accessing the dark web can carry significant risks for the researcher. While darknets attract users by touting anonymity and privacy, tracking can and does still occur. On Tor, the most popular of the darknets, traffic is routed through multiple nodes and wrapped in multiple layers of encryption. But the last layer is removed at the exit node, so traffic leaving it can be unencrypted, which can expose investigators and gives bad actors an attack surface.
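
The layering idea, and why the exit node is the weak point, can be illustrated with a toy model. The sketch below is not Tor's actual cryptography: `toy_cipher` is a deliberately simple, insecure XOR keystream used only for illustration, and the relay keys, the request and all function names are hypothetical.

```python
import hashlib

def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256-derived keystream. Illustration only, NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    # XOR is its own inverse, so the same call adds or removes a layer.
    return bytes(b ^ k for b, k in zip(data, stream))

def wrap(message: bytes, hop_keys: list) -> bytes:
    """Client side: add one layer per relay, with the exit relay's layer innermost."""
    for key in reversed(hop_keys):
        message = toy_cipher(key, message)
    return message

def relay_views(cell: bytes, hop_keys: list) -> list:
    """Each relay removes its own layer; record what it forwards onward."""
    views = []
    for key in hop_keys:
        cell = toy_cipher(key, cell)
        views.append(cell)
    return views

keys = [b"entry-key", b"middle-key", b"exit-key"]  # hypothetical per-relay keys
request = b"GET http://example.org/ HTTP/1.1"
views = relay_views(wrap(request, keys), keys)
for name, view in zip(["entry", "middle", "exit"], views):
    print(name, "forwards plaintext:", view == request)
```

In this toy run only the exit relay forwards the original request, which is why traffic that is not otherwise encrypted, such as plain HTTP, can be read at the exit node.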

For researchers to safely access the dark and deep web, without compromising their hardware, network or personal safety, they should consider a managed attribution platform to protect anonymity and isolate their browser instance.

